ABSTRACT

The B3 DNA-binding domains (DBDs) of plant transcription factors (TFs) and the DBDs of the EcoRII and Bfil restriction endonucleases (EcoRII-N and Bfil-C) share a common structural fold, classified as the DNA-binding pseudobarrel. The B3 DBDs of plant TFs recognize a diverse set of target sequences. The only available co-crystal structure of a B3-like DBD is that of EcoRII-N (recognition sequence 5'-CCTGG-3'). To understand the structural and molecular mechanisms of B3 DBD specificity, we have solved the crystal structure of Bfil-C (recognition sequence 5'-ACTGGG-3') in complex with a 12-bp cognate oligoduplex. Structural comparison of the Bfil-C-DNA and EcoRII-N-DNA complexes reveals a conserved DNA-binding mode and a conserved pattern of interactions with the phosphodiester backbone. The determinants of target specificity are located in the loops that emanate from the conserved structural core. The Bfil-C-DNA structure presented here expands the range of templates for modeling the DNA-bound complexes of the B3 family of plant TFs.

INTRODUCTION 

Structural studies of plant transcription factors (TFs) and type II restriction endonucleases (REases) revealed an unexpected link between these seemingly unrelated protein families. Despite the lack of sequence similarity, the B3 DNA-binding domain (DBD) of plant TFs (1,2), the effector domain of EcoRII REase [EcoRII-N (3)] and the DBD of Bfil REase [Bfil-C (4)] share a common fold, classified as the DNA-binding pseudobarrel (SCOP number 101935). The conserved core of these proteins, referred to here as 'B3-like domains', consists of a barrel-shaped 7-stranded β-sheet capped by α-helices at both ends (5,6) (Figure 1). Genes encoding B3-like domains are widespread in the plant kingdom, from green algae to flowering plants. For example, Arabidopsis alone has 118 B3 domain-containing proteins, which are often involved in hormone response pathways (1).

An atomic view of DNA recognition by the B3-like proteins was provided by the crystal structure of the EcoRII-N-DNA complex (7). In EcoRII-N, the β2, β5 and β4 β-strands, located at one edge of the pseudobarrel, the α-helix α2 and the connecting loops form a wrench-like cleft that approaches DNA from the major groove side (7). The DNA-binding interface contains two 'recognition arms' that interact with different parts of the 5'-CCTGG-3' target site: the N-arm (strand β2 and helix α2) contacts the 5'-terminal part of the recognition sequence, while the C-arm (strands β5 and β4) interacts with the 3'-terminal part of the recognition site. Structural comparison of other B3-like domains in the DNA-free form indicates a conserved structural core and a wrench-like DNA-binding cleft; moreover, a number of positively charged EcoRII residues involved in DNA-backbone interactions are conserved in this cleft (4,7). Taken together, these findings suggest that the DNA-binding mode identified in the EcoRII-N-DNA crystal structure is conserved in other B3-like domains (7). Analysis of the DNA-binding interface of the Arabidopsis thaliana B3-family TFs RAV1 (8) and VRN1 (9) supports this hypothesis.
Figure 1. B3 and B3-like DBDs of the pseudobarrel fold (SCOP number 101935). (A) Apo DBD (Bfil-C) of the Bfil restriction enzyme [residues 193-358 of chain B, PDB ID 2C1L (4)]. (B) The effector domain of the EcoRII restriction enzyme (EcoRII-N) in the DNA-bound form [PDB ID 3HQF (7)]. (C) The NMR structure of the RAV1-B3 domain [model 1 in PDB ID 1WID (5)]. The common structural core made of 7 β-strands is colored magenta; the DNA backbone in (B) is depicted as a black double helix.



The B3-like domains display remarkable plasticity in terms of target specificity: they interact with non-specific, pentanucleotide and hexanucleotide DNA sequences (Table 1). The crystal structure of EcoRII-N provided a structural mechanism for pentanucleotide sequence recognition; however, it remained to be established how B3-like domains adapt a conserved structural fold for the recognition of different DNA sites. To understand the structural and molecular mechanisms of B3 DBD specificity, we have solved the crystal structure of the C-terminal domain (Bfil-C) of Bfil REase from Bacillus firmus in complex with a 12-bp cognate oligoduplex. Bfil-C belongs to the B3-like domain family and interacts with the hexanucleotide sequence 5'-ACTGGG-3', which partially overlaps with the EcoRII-N site 5'-CCTGG-3' (the overlapping base pairs are CTGG). Comparison of the Bfil-C-DNA and EcoRII-N-DNA complexes revealed a structurally conserved pattern of phosphodiester backbone contacts and conserved amino acid residues in the DNA-binding cleft. Taken together, the Bfil-C-DNA structure presented here reveals how a conserved structural core is adapted for the recognition of variable DNA sequences and expands the template range for modeling the DNA-bound complexes of the B3 family of plant TFs.
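Because the Bfil-C site 5'-ACTGGG-3' is asymmetric while the EcoRII-N site 5'-CCWGG-3' is pseudopalindromic, a site search has to consider both DNA strands, and only the asymmetric site can occur in two distinguishable orientations. The minimal Python sketch below illustrates this with regular expressions; the helper names (`find_sites`, `revcomp`) and the restricted set of IUPAC codes are our own illustrative assumptions, not software used in this study.

```python
import re

# Subset of IUPAC codes needed for the Bfil and EcoRII sites (assumption: no other codes used).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}
# W (A or T) and N are self-complementary, so only A/C/G/T need swapping.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA/IUPAC string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna: str, site: str) -> list[tuple[int, str]]:
    """Return sorted (0-based top-strand start, strand) pairs for every match of site.

    '+' means the site reads 5'->3' on the top strand; '-' means it reads
    5'->3' on the bottom strand (its reverse complement appears on top).
    Palindromic sites such as CCWGG are searched only once.
    """
    dna = dna.upper()
    motifs = [("+", site)]
    rc = revcomp(site)
    if rc != site:
        motifs.append(("-", rc))
    hits = []
    for strand, motif in motifs:
        pattern = "".join(IUPAC[base] for base in motif)
        # Zero-width lookahead so that overlapping matches are all reported.
        for match in re.finditer(f"(?={pattern})", dna):
            hits.append((match.start(), strand))
    return sorted(hits)


print(find_sites("AGCACTGGGTCG", "ACTGGG"))      # top strand of 12/12 -> [(3, '+')]
print(find_sites("AGCGTAGCCCGGGGCT", "ACTGGG"))  # top strand of 16NS -> []
```

The 0-based start 3 corresponds to position A4 in the 1-based numbering of the 12/12 duplex.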

MATERIALS AND METHODS 

DNA oligoduplexes 

HPLC-grade oligonucleotides were purchased from Metabion (Martinsried, Germany). The oligoduplex 12/12 (top/bottom strand sequences 5'-AGCACTGGGTCG-3'/3'-TCGTGACCCAGC-5', respectively; the Bfil site ACTGGG occupies positions 4-9 of the top strand), used for generation of the minimal Bfil-C-DNA complex, was assembled as described in (7). Prior to annealing, the bottom strands of oligoduplexes 16SP (5'-AGCGTAGCACTGGGCT-3'/3'-TCGCATCGTGACCCGA-5') and 16NS (5'-AGCGTAGCCCGGGGCT-3'/3'-TCGCATCGGGCCCCGA-5') used for DNA-binding experiments were radiolabeled using [γ-³³P]ATP (Hartmann Analytic, Braunschweig, Germany) and T4 polynucleotide kinase (Thermo Fisher Scientific, Vilnius, Lithuania).

Table 1. B3 and B3-like proteins and their recognition sequences

Protein | Recognition site | Reference
VRN1-B3 | non-specific | (9)
RAV1-B3 | 5'-CACCTG-3' | (10)
ARF1-B3 | 5'-TGTCTC-3' | (11)
ABI3-B3 | 5'-CATGCA-3' | (12)
EcoRII-N | 5'-CCTGG-3' | (7,13)
Bfil-C | 5'-ACTGGG-3' | (14)

Preparation of the Bfil-C domain 

Bfil mutant K107A was expressed in Escherichia coli strain ER2566 carrying plasmids pET21b-BfiIR6.5-K107A and pBfiIM9.1 and purified as described previously (15). Protein concentration was determined by measuring the absorbance at 280 nm and is expressed in terms of dimer, using the extinction coefficient of 99 700 M⁻¹ cm⁻¹ (calculated using the ProtParam tool, http://web.expasy.org/protparam/). The Bfil-C-DNA complex was obtained by limited proteolysis of the full-length Bfil-DNA complex. The full-length K107A Bfil mutant was mixed with oligoduplex 12/12 at a protein dimer/DNA molar ratio of 1:2.2 [the stoichiometry of the Bfil-DNA complex is 1:2 (15)] in a buffer containing 10 mM Tris-HCl (pH 7.5 at 25°C), 200 mM KCl, 1 mM DTT, 2 mM calcium acetate and 10% glycerol. Thermolysin was added at a 1:100 (w/w) protease/protein ratio. Limited proteolysis was performed at 37°C for 20 h, and the reaction was terminated as described in (14). The reaction mixture was diluted two-fold with 10 mM Tris-HCl (pH 7.5 at 25°C) to reduce the salt concentration and loaded onto a 1-ml Heparin Sepharose column (GE Healthcare). The Bfil-C-DNA complex was collected in the flowthrough fraction, subsequently loaded onto a Mono Q 5/50 GL column (Amersham Pharmacia Biotech) and eluted using a KCl gradient in 20 mM Tris-HCl (pH 7.5 at 25°C), 1 mM EDTA and 10% glycerol. All purification steps were carried out at room temperature. Fractions containing the Bfil-C-DNA complex were dialyzed against 10 mM Tris-HCl (pH 7.5 at 25°C), 100 mM KCl, 1 mM DTT and 10% glycerol and stored at 4°C. The concentration of Bfil-C in the complex was determined by the Bradford assay, using the previously purified apo Bfil-C domain (14) as a reference.
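The A280-based concentration and the 1:2.2 dimer/DNA mixing ratio described above amount to a Beer-Lambert calculation followed by a multiplication. The sketch below assumes a 1-cm path length and uses a hypothetical absorbance reading; only the extinction coefficient and the 1:2.2 ratio come from the text.

```python
# Beer-Lambert sketch: c = A / (epsilon * l). The extinction coefficient is the
# ProtParam value quoted in the text and refers to the Bfil dimer.
EPSILON_DIMER_PER_M_PER_CM = 99_700.0


def dimer_concentration_uM(a280: float, path_cm: float = 1.0, dilution: float = 1.0) -> float:
    """Dimer concentration (uM) from A280 of a sample diluted `dilution`-fold before reading."""
    molar = a280 * dilution / (EPSILON_DIMER_PER_M_PER_CM * path_cm)
    return molar * 1e6


def dna_for_ratio_uM(dimer_uM: float, dna_per_dimer: float = 2.2) -> float:
    """DNA duplex concentration giving a protein dimer/DNA molar ratio of 1:dna_per_dimer."""
    return dimer_uM * dna_per_dimer


protein = dimer_concentration_uM(0.997)  # hypothetical A280 reading, about 10 uM dimer
print(round(protein, 3), round(dna_for_ratio_uM(protein), 3))
```

A ratio of 1:2.2 rather than the strict 1:2 stoichiometry leaves a small excess of DNA, so that essentially all dimers are loaded with two duplexes before proteolysis.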

Crystallization, data collection and structure determination

The Bfil-C complex with oligoduplex 12/12 was concentrated to 1.7 mg/ml using a Centricon concentrator (Millipore) with a 3 kDa MW cut-off. Crystals were grown by the sitting-drop vapor diffusion method at 19°C by mixing 0.5 µl of the complex solution with 0.5 µl of crystallization buffer #18 of the 'Index' screen (Hampton Research), which contained 0.49 M NaH2PO4 and 0.91 M K2HPO4 (pH 6.9 at 25°C). A crystal belonging to space group P6₅ appeared after 3 years. Prior to flash-cooling, the crystal was transferred into a 3:1 mixture of the reservoir buffer with glycerol. X-ray diffraction data were collected using an in-house Rigaku RU-H3R rotating anode generator. Images were processed using MOSFLM (16,17) and SCALA (18). Initial phases were obtained by molecular replacement using Bfil-C residues 200-347 from the full-length apo Bfil structure [PDB ID 2C1L (4)]. The structure was initially refined using REFMAC (19), and the final refinement step was performed with PHENIX (20). COOT (21) was used for model inspection. Data collection and refinement statistics are presented in Table 2.

After molecular replacement, several DNA phosphates were immediately visible in the electron density map. After improvement of the protein model, we were able to build all 12 bp of the duplex [atomic DNA coordinates were generated by 3D-DART (22)]. However, it was still not possible to distinguish between purines and pyrimidines or to determine the absolute orientation of the DNA in the DNA-binding cleft (the asymmetric nature of the recognition sequence rules out overlap of the alternative DNA orientations). To solve this problem, we produced two models of Bfil-C with alternative DNA orientations. Both models were subjected to 20 cycles of positional refinement in REFMAC (19), and the model with the better refinement statistics (Rfree = 0.26 versus Rfree = 0.32 for the alternative model) was selected as the one having the correct duplex orientation. The final structure, refined to Rfree = 0.226 (Table 2), has clear electron density for each base pair of the recognition sequence, and purine and pyrimidine bases are easily distinguishable in both duplexes within the crystallographic asymmetric unit (Supplementary Figure S1).
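The orientation test above rests on the free R factor, R = Σ||Fobs| - |Fcalc|| / Σ|Fobs| evaluated only over test-set reflections withheld from refinement. The sketch below shows that logic in plain Python; it assumes the amplitudes are already on a common scale (REFMAC performs the scaling itself), and the model labels are hypothetical. Only the two Rfree values are taken from the text.

```python
def r_factor(f_obs, f_calc):
    """R = sum| |Fobs| - |Fcalc| | / sum|Fobs| (amplitudes assumed on a common scale)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_free(f_obs, f_calc, is_test):
    """R factor restricted to the test-set reflections excluded from refinement."""
    test = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, is_test) if flag]
    return r_factor([fo for fo, _ in test], [fc for _, fc in test])


def pick_orientation(rfree_by_model: dict) -> str:
    """Choose the model (here, a DNA orientation) with the lowest Rfree."""
    return min(rfree_by_model, key=rfree_by_model.get)


# Rfree values reported in the text for the two refined DNA orientations.
print(pick_orientation({"orientation 1": 0.26, "orientation 2": 0.32}))
```

Because the test reflections never enter refinement, a lower Rfree for one orientation cannot be explained simply by extra fitting freedom, which is why it serves as the selection criterion.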

Coordinates and structure factors have been deposited in the PDB under ID 3ZI5. DNA geometry in the Bfil-C-DNA structure was analysed with CURVES (23). The contact surfaces buried between the two molecules were calculated using NACCESS (24). Protein-DNA contacts were analysed by NUCPLOT (25). Graphical representations were produced with MolScript (26,27).

Table 2. Diffraction data and structure refinement statistics

Data collection (a)
Space group | P6₅
Unit cell | a = b = 175.18 Å, c = 35.79 Å; α = β = 90.0°, γ = 120°
Resolution, Å (final shell) | 50.57-3.2 (3.37-3.2)
Reflections, unique (total) | 10809 (75356)
Completeness, % overall (final shell) | 100 (100)
I/σ(I) overall (final shell) | 13.3 (7.2)
Rmerge overall (final shell) (b) | 0.147 (0.240)
B(iso) from Wilson plot, Å² | 24.4

Refinement (c)
Resolution range, Å | 43.8-3.2
Number of protein atoms | 2642
Number of DNA atoms | 972
Rcryst (Rfree) | 0.176 (0.226)
RMS deviations, bonds (Å)/angles (°) | 0.003/0.743
Average B factors (Å²), total | 49.1
  Main chain | 48.9
  Side chains | 49.6
  Solvent | 28.8
Ramachandran plot, favored | 96.04%
Ramachandran plot, allowed | 3.96%
Ramachandran plot, outliers | 0.00%

(a) The dataset was collected at 100 K.
(b) $R_{\mathrm{merge}} = \sum_{h}\sum_{i=1}^{n_h} \left| I_{hi} - \langle I_h \rangle \right| \Big/ \sum_{h}\sum_{i=1}^{n_h} I_{hi}$, where $I_{hi}$ is the intensity of the $i$-th measurement of reflection $h = (h,k,l)$, the sum over $h$ runs over all measured reflections, $\langle I_h \rangle$ is the average measured intensity of reflection $h$, and $n_h$ is the number of measurements of reflection $h$.
(c) Test set size 9.6%.
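The Rmerge definition in the footnote to Table 2 can be made concrete with a short calculation that groups repeated measurements by Miller index. This is an illustrative sketch with toy intensities, not the SCALA implementation; in particular, how singly measured reflections are handled here (kept in the denominator) is our assumption.

```python
from collections import defaultdict


def r_merge(observations):
    """R_merge = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi.

    observations: iterable of ((h, k, l), intensity) pairs, one per measurement.
    Reflections measured only once contribute zero to the numerator but are kept
    in the denominator here; data-processing programs may treat them differently.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Toy data: two measurements each of two reflections.
toy = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(round(r_merge(toy), 4))  # 0.0323
```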

Bfil mutants 

His-tagged full-length Bfil mutant variants were generated by the QuickChange (28) or 'megaprimer' (29) methods using the plasmid pET15b-BfiIR6.5 as the template for PCR. The E. coli strain ER2267 carrying the plasmid pBfiIM9.1 was used as the transformation host. The mutations were confirmed by sequencing the entire gene. The proteins were purified using Ni²⁺-charged HiTrap HP and HiTrap Heparin HP columns (GE Healthcare) as described in (15) to >90% homogeneity (~50% for Y227A), as estimated by SDS-PAGE. Concentrations of the WT and mutant proteins were determined as described above, using extinction coefficients calculated for each mutant with the ProtParam tool (http://web.expasy.org/protparam/), and are expressed in terms of dimer. The K340A mutant was not isolated because of its very low expression level.

Electrophoretic mobility shift assay 

DNA binding by WT Bfil and its mutants was analysed by the electrophoretic mobility shift assay (EMSA) using the 16-bp specific and non-specific DNA duplexes 16SP and 16NS, respectively. DNA (final concentration 1 nM) was incubated with Bfil (final concentrations ranging from 1 to 1000 nM of dimer) for 10 min at room temperature in 20 µl of binding buffer containing 40 mM Tris-acetate (pH 8.3 at 25°C), 0.1 mM EDTA, 0.1 mg/ml BSA and 10% glycerol. Free DNA and protein-DNA complexes were separated by electrophoresis on 6-8% polyacrylamide gels (29:1 acrylamide/bisacrylamide in 40 mM Tris-acetate, pH 8.3 at 25°C, and 0.1 mM EDTA). Radiolabeled DNA was detected and quantified using the Cyclone phosphorimager and OptiQuant software (Packard Instrument). Association constants $K_A$ ($K_A = 1/K_D$, where $K_D$ is the dissociation constant) were calculated as described previously (13).
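One simple way to turn an EMSA titration into $K_D$ and $K_A$ is sketched below. It is not necessarily the procedure of reference (13): it assumes a single-site hyperbolic isotherm in which free protein is approximated by total protein (reasonable only when the 1 nM DNA is well below $K_D$), and fits $K_D$ by a log-spaced grid search so that no external libraries are needed. A quadratic ligand-depletion model would be more rigorous at the lowest protein concentrations. The titration data are synthetic.

```python
import math


def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Hyperbolic isotherm; assumes free protein ~ total protein (DNA well below Kd)."""
    return protein_nM / (protein_nM + kd_nM)


def fit_kd(protein_nM, theta, lo=0.01, hi=1e4, steps=4000):
    """Least-squares Kd (nM) by a log-spaced grid search between lo and hi."""
    best_kd, best_sse = lo, math.inf
    for j in range(steps + 1):
        kd = lo * (hi / lo) ** (j / steps)
        sse = sum((fraction_bound(p, kd) - t) ** 2 for p, t in zip(protein_nM, theta))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


def association_constant_per_M(kd_nM: float) -> float:
    """K_A = 1/K_D, with K_D converted from nM to M."""
    return 1.0 / (kd_nM * 1e-9)


# Synthetic, noise-free titration generated with Kd = 2 nM (illustration only).
conc = [1, 2, 5, 10, 50, 100, 1000]
theta = [fraction_bound(p, 2.0) for p in conc]
kd = fit_kd(conc, theta)
print(f"Kd ~ {kd:.2f} nM, KA ~ {association_constant_per_M(kd):.2e} /M")
```

For example, $K_D$ = 2 nM corresponds to $K_A$ = 5 × 10⁸ M⁻¹.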

Activity measurements 

The DNA cleavage activity of WT Bfil and its mutants was measured by incubating different amounts of purified protein (2 to 1000 nM) with 1 µg of λ DNA in 50 µl of Y+/Tango™ buffer (Thermo Fisher Scientific) for 1 h at 37°C and analysing the cleavage products by agarose gel electrophoresis. One unit of Bfil is defined as the smallest amount of protein that completely digests λ DNA. The specific activity of WT Bfil is 100 000 units/mg protein. To demonstrate that the decrease in catalytic activity results only from the point mutations in the C-domain, all mutant proteins were subjected to limited proteolysis with thermolysin, which liberates the protease-resistant catalytic N-terminal domain (14); in all cases, the resulting N-terminal domains showed the expected non-specific DNA cleavage.
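The unit definition implies that specific activity is inversely proportional to the smallest amount of protein that gives complete digestion. The sketch below derives the mass per unit from the quoted WT specific activity and expresses a mutant's activity relative to WT. It assumes that WT and mutants have essentially the same molecular mass and were titrated in the same reaction volume; the titration points are hypothetical.

```python
WT_SPECIFIC_ACTIVITY_U_PER_MG = 100_000  # value quoted in the text for WT Bfil


def ng_per_unit(specific_activity_u_per_mg: float) -> float:
    """Protein mass (ng) corresponding to one unit (1 mg = 1e6 ng)."""
    return 1e6 / specific_activity_u_per_mg


def minimal_complete(titration):
    """Smallest protein concentration (nM) giving complete digestion.

    titration: list of (concentration_nM, digestion_complete) pairs read from a gel.
    Raises ValueError if no concentration gave complete digestion.
    """
    return min(conc for conc, complete in titration if complete)


def relative_activity_percent(wt_min_nM: float, mutant_min_nM: float) -> float:
    """Mutant activity as % of WT, assuming equal molecular mass and reaction volume."""
    return 100.0 * wt_min_nM / mutant_min_nM


wt = minimal_complete([(1, False), (2, True), (5, True)])          # hypothetical gel
mutant = minimal_complete([(100, False), (200, True), (500, True)])  # hypothetical gel
print(ng_per_unit(WT_SPECIFIC_ACTIVITY_U_PER_MG), relative_activity_percent(wt, mutant))
```

With 100 000 units/mg, one unit corresponds to 10 ng of WT protein; a mutant that needs 100 times more protein for complete digestion has 1% of the WT specific activity.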

RESULTS 

Overall structure of the Bfil-C-DNA complex 

The asymmetric unit of the crystal contains two Bfil-C-DNA complexes (Supplementary Figure S2A). Despite the relatively large contact surface between the protein chains (~1000 Å²), this interaction appears to be biologically irrelevant, since isolated Bfil-C is a monomer that binds only one DNA copy (14). Both monomers in the asymmetric unit are nearly identical and can be superimposed with an RMSD of 0.13 Å, including protein side chains. The bound DNA adopts the B-form. In the crystal, the DNA oligoduplexes form a continuous left-handed pseudohelix owing to stacking interactions between symmetry-related oligoduplexes (Supplementary Figure S2B). The secondary structure of the DNA-bound Bfil-C domain is similar to that of apo-Bfil-C (PDB ID 2C1L, residues 193-358), the only exception being the 3₁₀ helix (residues 232-237), which is missing in the apo form.

DNA recognition by Bfil-C 

The wrench-like DNA-binding cleft of Bfil-C is similar to that of EcoRII-N (7) (Figure 1A and B). The N-arm (residues 210-231) in the Bfil-C binding cleft includes α-helix α6, β-strand β11 and the connecting loop, while the C-arm (residues 266-287) consists of the antiparallel β-strands β14 and β15 and the connecting loop (Figure 2A). In contrast to EcoRII-N, the DNA contacts made by Bfil-C extend beyond the N- and C-arms and include additional structural elements, namely the N-loop connecting β-strands β12 and β13 (residues 245-252, Figure 2A) and the C-loop connecting β-strands β18 and β19 (residues 335-341, Figure 2A).

Bfil-C approaches DNA from the major groove and makes an extensive set of contacts with the DNA backbone (Supplementary Table S1 and Figure S3A). More specifically, several residues in the N-loop and residue R272 in the C-arm contact the phosphates of the top DNA strand of the target site, while residues in the C-loop and within (or in the vicinity of) the N-arm interact with the phosphates of the bottom strand.

The sequence-specific Bfil-C-DNA interactions are provided by amino acid residues located in the N- and C-arms. Residues in the N-arm form hydrogen bonds with the bases at the 5'-end of the site (green box in Figure 2B), whereas residues in the C-arm make specific contacts with the bases at the 3'-end of the site (orange box in Figure 2B). In total, Bfil-C employs nine amino acid residues to make H-bonds and vdW contacts to the bases of the recognition site in the major groove (Figure 2C), and uses a single residue, R247 in the N-loop, for minor groove interactions.

Comparison of apo- and DNA-bound forms of Bfil-C 

The conformational changes of Bfil-C upon DNA binding are limited to the N-arm, N-loop and C-arm regions (Supplementary Figure S4). The loop in the N-arm (residues 216-223) moves towards the DNA backbone and brings the DNA-recognition residue T225 closer to the first two base pairs of the recognition site (Figure 2C). Upon DNA binding, the unstructured region (residues 232-237) folds into a 3₁₀ helix, and α-helix α6 (residues 208-215) moves closer to the DNA backbone, implying a favorable dipole interaction (Supplementary Figure S4).

β-strands β14 and β15 of the C-arm move relative to the β-barrel core, pulling away from strand β18. This conformational change brings residue N280 ~5 Å closer to the DNA bases and enables a base-specific contact to the penultimate C base. A significant conformational change upon DNA binding also occurs in the N-loop (residues 245-252), which shifts towards the DNA along with the C-arm (Supplementary Figure S4). As a result, residue R247 moves ~7.5 Å towards the DNA and makes an H-bond to the 5'-terminal adenine from the minor groove side (Figure 2C).

Mutational analysis of DNA-contacting residues 

The Bfil-C-DNA co-crystal structure was solved at a relatively low resolution (3.2 Å); therefore, the DNA-binding residues predicted by the structural analysis were subjected to site-directed mutagenesis to verify their functional importance (Table 3). Three sets of amino acid residues were analysed: (i) residues that make direct H-bonds or non-polar contacts to the bases of the recognition site in the major and minor grooves (Figure 2C); (ii) positively charged amino acid residues and N245, which interact with the DNA phosphates (Supplementary Table S1); and (iii) residues T252 and Q226, which are conserved in the Bfil family enzymes (Supplementary Figure S5) and overlap with EcoRII-N residues Q37 and N61 involved in DNA binding. The selected residues were replaced by alanine in the context of full-length Bfil, the mutant proteins were isolated, and their DNA binding and cleavage abilities were evaluated by EMSA (representative data are provided in Supplementary Figure S6) and by the λ DNA cleavage assay, respectively. With respect to their effects on DNA binding and cleavage, the mutations clustered into three groups: (i) 'unimportant' mutations, which decrease the cognate DNA-binding affinity ($K_A$) or the specific cleavage activity up to 10-fold; (ii) 'important' mutations, which decrease the cognate DNA-binding affinity ($K_A$) or the specific cleavage activity by 1-2 orders of magnitude; and (iii) 'essential' mutations, which decrease the cognate DNA-binding affinity ($K_A$) or the specific cleavage activity by more than two orders of magnitude (Table 3).

[Figure 2 image: (A) two views of the Bfil-C-DNA complex related by a 90° rotation, with the N-arm and N-loop labeled; (B) the numbered 12/12 oligoduplex 5'-AGCACTGGGTCG-3'/3'-TCGTGACCCAGC-5'; (C) individual base-pair recognition panels; (D) the numbered EcoRII-N oligoduplex; (E) EcoRII-N base-pair recognition panels.]

Figure 2. DNA recognition by Bfil-C and EcoRII-N. (A) View of the Bfil-C-DNA complex along the long DNA axis (left) and the side view (right). The DNA-recognition site is colored dark grey. The secondary structure elements of the N-arm (α-helix α6, β-strand β11) and the C-arm (β-strands β14 and β15) are colored green and orange, respectively. Spheres of the matching colors represent the Cα atoms of the DNA-recognition residues from the N- and C-arms. The additional DNA-recognition element N-loop (residues 245-252) is colored blue and the C-loop (residues 335-341) red. A region of the top DNA strand (nucleotides A4-G7) and the adjacent recognition residues are shown against their SIGMAA-weighted mFo-DFc electron density contoured at 2.0σ. (B) The sequence and numbering of the cognate 12/12 oligoduplex used in this study. DNA bases that interact with the N- and C-arms are boxed in green and orange, respectively. (C) Recognition of individual base pairs by Bfil-C. Panels for the individual base pairs are arranged following the top strand ACTGGG in the 5'→3' direction. The N- and C-arm residue labels are colored as in panel (A). (D and E) The sequence and numbering of the cognate EcoRII-N oligoduplex and the recognition of individual base pairs by EcoRII-N [PDB ID 3HQF (7)]. The residue labels and boxes encircling EcoRII-N sequence elements are colored as in panels (B and C).

Table 3. Mutational analysis of Bfil-C residues

Mutated residue | Location | Contact with DNA | DNA-binding ability (%) (a) | Impact on DNA binding | Specific activity (%) (b) | Impact on DNA cleavage

Direct contacts to DNA bases
[illegible] | N-arm | H-bonds to [illegible] (bottom) and [illegible] | [illegible] | Important | [illegible] | Essential
Q223 | N-arm | vdW contact to T9 | 100 | Unimportant | 100 | Unimportant
T225 | N-arm | H-bonds to A4 and [illegible] (bottom) | [illegible] | Important | [illegible] | Important
Y227 | N-arm | H-bond to C5 (top) | 5 | Important | <0.5 | Essential
W229 | N-arm | vdW contact to C6 | 1 | Essential | 3 | Important
R247 | N-loop | H-bond to A4 | <0.4 | Essential | 20 | Unimportant
E276 | C-arm | vdW contact to T6 | 10 | Important | 1 | Essential
N279 | C-arm | H-bond to G7 | <0.2 | Essential | <0.5 | Essential
N280 | C-arm | H-bonds to G8 (top) and C4 | <0.2 | Essential | <0.5 | Essential
D282 | C-arm | H-bonds to C5 (bottom) and C6 | <0.2 | Essential | <0.5 | Essential
R284 | C-arm | H-bonds to T6 and G7 | <0.2 | Essential | <1 | Essential

Contacts with the DNA phosphates
N245 | N-loop | 5'-ApCTGGG-3' | 100 | Unimportant | 20 | Unimportant
K250 | N-loop | 5'-ACTpGGG-3' | 2 | Important | 10 | Important
R272 | C-arm | 5'-pACTGGG-3' | 1 | Essential | 1 | Essential
R291 | Close to C-arm | 5'-pNACTGGG-3' | 2 | Important | 5 | Important
K340 (c) | C-loop | 3'-TGACCpC-5' | n.d. (d) | n.d. | n.d. | n.d.

Other residues conserved in Bfil-C-related proteins
Q226 | N-arm | | 50 | Unimportant | 20 | Unimportant
T252 | N-loop | | n.d. | n.d. | 100 | Unimportant

(a) The DNA-binding ability is expressed as the percentage ratio of the protein-DNA association constant $K_A$ of the full-length Bfil mutant to the $K_A$ of the WT Bfil dimer (5 × 10⁸ M⁻¹).
(b) The specific activity of the Bfil mutants is expressed as a percentage of the activity of WT Bfil.
(c) Because of its very low expression level, the Bfil K340A mutant could not be purified.
(d) n.d., not determined.
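The three-tier classification used in Table 3 can be written as a pair of thresholds on the residual affinity or activity (% of WT). The boundary handling in this sketch is our reading of Table 3 rather than a rule stated by the authors: a value of exactly 10% (E276 binding) is classed as important and exactly 1% (W229 binding, R272) as essential. Upper-bound entries such as '<0.2' are taken at face value.

```python
def parse_percent(value: str) -> float:
    """Turn a Table 3 entry such as '5' or '<0.2' into a number (upper bounds taken at face value)."""
    return float(value.lstrip("<"))


def classify(percent_of_wt: float) -> str:
    """Three-tier label from the residual affinity or activity (% of WT)."""
    if percent_of_wt > 10:   # less than a 10-fold decrease
        return "Unimportant"
    if percent_of_wt > 1:    # 10-fold up to (but not including) 100-fold decrease
        return "Important"
    return "Essential"       # at least a 100-fold decrease


TABLE3 = {  # residue: (DNA-binding ability %, specific activity %), copied from Table 3
    "Y227": ("5", "<0.5"),
    "W229": ("1", "3"),
    "R247": ("<0.4", "20"),
    "E276": ("10", "1"),
    "K250": ("2", "10"),
}

for residue, (binding, activity) in TABLE3.items():
    print(residue, classify(parse_percent(binding)), classify(parse_percent(activity)))
```

Run on these rows, the thresholds reproduce the labels printed in Table 3, including the R247A case in which binding and cleavage fall into different classes.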

Nearly all N- and C-arm residues that make direct contacts to the DNA bases are either essential or important (Figure 2C and Table 3) and also show a high degree of conservation in the Bfil family (Supplementary Figure S5). Two of these residues, Y227 and D282, form H-bonds with the N4 amino groups of the cytosines that are methylated by the Bfil methyltransferases [Figure 2C, (30)]. It seems that disruption of these interactions, either by N4-methylation or by mutation (Y227A or D282A), completely abolishes Bfil cleavage. The only discrepancy between the DNA binding and cleavage data is observed for the R247A mutation: it is essential for binding but unimportant for cleavage. Presumably, removal of this positively charged residue destabilizes the Bfil-DNA complex to such an extent that it can no longer be resolved in the non-equilibrium EMSA experiment, yet the complex is sufficiently long-lived to enable DNA cleavage under the λ DNA cleavage assay conditions. A number of residues making contacts with the DNA backbone are also important for Bfil function, the most critical being the highly conserved C-arm residue R272, which makes an H-bond to the 5'-pACTGGG-3' phosphate (Figure 3). In contrast, the N-arm residue Q223 is unimportant for Bfil function, implying that it does not make any contact to the methyl group of the T base in the bottom DNA strand (Figure 2C). Although residues Q226 and T252 have structural counterparts in Bfil family enzymes and in EcoRII, they are not important for Bfil function (Table 3).



DISCUSSION 

The close structural similarity between the B3 domains of plant TFs and the effector domain of EcoRII REase suggests that B3 domains either diverged from a common ancestor or were horizontally transferred into higher plants from symbiotic or pathogenic bacteria (1,2,33). Target sites of plant TFs of the B3 superfamily vary both in length and in sequence (Table 1), suggesting that the DNA-binding pseudobarrel fold exhibits a structural plasticity that enables recognition of different DNA sequences by a conserved structural core. The molecular mechanisms of target site specificity of the plant TFs have yet to be established, since the 3D structures of plant B3 domains have been solved only in the absence of DNA [PDB IDs 1WID (5), 4I1K (9), 1YEL (6)]. The crystal structure of the EcoRII effector domain bound to an oligoduplex containing its 5-bp target sequence provided the first structural template for modeling DNA complexes of plant TFs (7,9). The Bfil-C-DNA structure presented here reveals for the first time the structural mechanism by which a B3-like domain adapts to a different DNA target and provides a new template for modeling plant TFs.

Conserved DNA orientation in the B3-like domains 

The fold of B3-like domains is termed a 'pseudobarrel' because of the missing connectivity between two strands (β10 and β11 in Bfil-C, and β1 and β2 in EcoRII-N) in the otherwise barrel-shaped 7-stranded β-sheet; the β-sheets of Bfil-C and EcoRII-N superimpose with an RMSD of <2 Å for 75 Cα atoms [calculated using MultiProt (31)]. The 'open' edge of the β-sheet forms the concave wrench-like surface that fits into the DNA major groove and provides base-specific contacts. The spatial positioning of the pseudobarrel with respect to the long helical axis of the DNA is conserved in the Bfil-C-DNA and EcoRII-N-DNA structures (Supplementary Figure S3A and B), suggesting a similar DNA orientation with respect to the protein core for other B3-like and B3 domains, including RAV1-B3 [PDB ID 1WID (5)], At1g16640-B3 [PDB ID 1YEL (6)] and VRN1-B3 [PDB ID 4I1K (9)]. A recent high-resolution crystal structure and DNA-binding analysis of the plant TF VRN1 are consistent with a similar DNA-binding mode (9).

Figure 3. Sequence and structure elements involved in protein-DNA interactions in B3-like domains. (A) Structure-based multiple sequence alignment of Bfil-C (PDB ID 2C1L, chain A), EcoRII-N (PDB ID 3HQF), RAV1-B3 (PDB ID 1WID, model 1), VRN1-B3 (PDB ID 4I1K, chain A) and At1g16640-B3 (PDB ID 1YEL, model 1), generated by MultiProt and Staccato (31). Residues and secondary structure elements are numbered according to the Bfil-C-DNA structure. The Bfil-C DNA-binding elements N-arm, N-loop, C-arm and C-loop are marked by green, blue, orange and red stripes, respectively. The (D/E)XR motif of Bfil-C and EcoRII-N responsible for recognition of the 5'-TGG-3' trinucleotide is marked by black boxes and black circles. Residues contacting the 'clamp' phosphates are marked by magenta boxes and asterisks. The figure was generated using ESPript (32). (B) Interaction of B3-like domains with DNA: Bfil-C (this study), EcoRII-N [PDB ID 3HQF (7)] and RAV1-B3 [PDB ID 1WID; interactions with DNA according to Yamasaki et al. (5,8)]. The bases in the recognition sites are colored yellow; orange, green, blue and red islands encircle residues from the N-arm, C-arm, N-loop and C-loop, respectively. Residues contacting DNA phosphate oxygen atoms are depicted in the proximity of the corresponding phosphates. 'Clamp' phosphates are colored cyan.

Plasticity of the DNA-binding interface of the B3-like domains enables recognition of different sequences

The crystal structure of Bfil-C presented here and the EcoRII-N structure solved by us earlier (7) provide a unique opportunity to compare the structural and molecular mechanisms of sequence recognition by two B3-like proteins that interact with target sites of different length and sequence. Bfil-C is specific for the asymmetric 6-bp sequence 5'-ACTGGG-3', while EcoRII-N recognizes the partially overlapping pseudopalindromic 5-bp sequence 5'-CCWGG-3' (W = A or T; the overlapping bases are CWGG). Most of the sequence-specific and DNA backbone contacts in Bfil-C and EcoRII-N are made by residues located in the loops that extend from the concave β-sheet. Bfil-C and EcoRII-N share two conserved structural elements, the N- and C-arms, which make base-specific contacts to DNA in both proteins. The N-arms of both proteins contribute amino acid residues for specific interactions with bases in the 5'-half of the recognition site, while the residues located in the C-arm contact the bases in the 3'-half of the target site (Figure 2B and D). Intriguingly, Bfil-C and EcoRII-N bind their partially overlapping recognition sites in the same orientation. Moreover, although Bfil-C and EcoRII-N recognize the first C:G base pair of the overlapping region differently, both proteins make a similar set of contacts to the 5'-TGG-3' trinucleotide. For example, both proteins use a structurally conserved C-arm arginine (R284 in Bfil-C, R98 in EcoRII-N) to make an H-bond to the O4 atom of the T base, and a conserved carboxylate (D282 in Bfil-C, E96 in EcoRII-N) to make an H-bond to the N4 atom of the cytosine in the last G:C base pair (Figure 2C and E).
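The statement that the two sites overlap in the same orientation, with the Bfil site extending one base pair beyond the 3'-end of the EcoRII site, can be checked with an ungapped alignment that treats W as matching A or T. The helper names below are hypothetical and the scoring (count of compatible positions) is a deliberately simple assumption.

```python
# IUPAC codes needed here; W = A or T (assumption: no other degenerate codes).
IUPAC = {"A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "W": {"A", "T"}}


def compatible(a: str, b: str) -> bool:
    """True if two IUPAC symbols can denote the same base."""
    return bool(IUPAC[a] & IUPAC[b])


def best_overlap(site1: str, site2: str) -> tuple[int, int]:
    """Ungapped, same-orientation alignment of site2 against site1.

    Returns (number of compatible positions, offset of site2[0] within site1);
    ties keep the lowest offset found.
    """
    best = (0, 0)
    for offset in range(-(len(site2) - 1), len(site1)):
        score = sum(
            1
            for i, base in enumerate(site2)
            if 0 <= offset + i < len(site1) and compatible(site1[offset + i], base)
        )
        if score > best[0]:
            best = (score, offset)
    return best


score, offset = best_overlap("ACTGGG", "CCWGG")
print(score, offset)  # 4 0
```

At offset 0, CWGG of the EcoRII-N site matches CTGG of the Bfil-C site, and the final G of ACTGGG lies beyond the 3'-end of the aligned 5-bp site.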

Most of the sequence-specific contacts in Bfil are provided by amino acid residues located in the N- and C-arms. Interestingly, the N-arm of Bfil-C is 8 and 11-13 amino acid residues longer than its structural equivalents in EcoRII-N and in other B3-like domains, respectively (Figure 3A). Despite these length differences, the N-arm of Bfil-C makes a similar number of base-specific contacts as the N-arm of EcoRII-N (Figure 3B). The extra part of the loop connecting α-helix α6 and β-strand β11 in the Bfil-C N-arm points away from the DNA and does not contribute to the interactions with DNA. Presumably, the length of the N-arm in B3 domains is also sufficient to secure base-specific contacts to DNA (Figure 3).

BfiI-C also contains additional structural elements, namely the N- and C-loops, that are involved in interactions with DNA. Amino acid residues in the N-loop contribute to the base-specific interactions at the 5'-end of the target site, while residues in the C-loop make DNA-backbone contacts at the 3'-end of the recognition sequence. The protein-DNA interface area in the BfiI-C-DNA complex is therefore larger than in the EcoRII-N-DNA complex [2800 Å² versus 2200 Å² (7)]. The structural equivalents of the N- and C-loops in EcoRII-N, and of the N-loop in the B3 domains of plant TFs, are shorter than in BfiI-C and therefore cannot make direct contacts to DNA (Figure 3A).
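The size of the interface difference quoted above can be made explicit with a line of arithmetic. This minimal Python sketch uses only the two areas given in the text; the helper `relative_increase` is a hypothetical name, and the calculation says nothing about how much of the extra area comes specifically from the N- and C-loops.

```python
# Protein-DNA interface areas quoted in the text, in square angstroms.
interfaces = {"BfiI-C": 2800.0, "EcoRII-N": 2200.0}


def relative_increase(larger: float, smaller: float) -> float:
    """Percentage by which `larger` exceeds `smaller`."""
    return 100.0 * (larger - smaller) / smaller


diff = interfaces["BfiI-C"] - interfaces["EcoRII-N"]
pct = relative_increase(interfaces["BfiI-C"], interfaces["EcoRII-N"])
print(f"extra interface: {diff:.0f} Å² ({pct:.0f}% larger)")
```

It reports about 600 Å² of additional interface, roughly 27% more than in the EcoRII-N-DNA complex.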

The BfiI-C recognition sequence is extended to 6 bp by an extra G:C base pair at the 3'-terminus with respect to the 5-bp EcoRII-N target. Surprisingly, BfiI-C recognizes the extended recognition site using a C-arm that is 5 residues shorter and more compact than that of EcoRII-N (Figure 3A). The side chains of the DNA-facing amino acids on the C-arm loop are shorter in BfiI-C than those of the structurally equivalent residues in EcoRII-N (D282 and N280 in BfiI-C versus E96 and R94 in EcoRII-N, Figure 4). BfiI-C positions this compact cluster of amino acids close to the bases at the 3'-end of the recognition site 5'-ACTGGG-3' and makes direct contacts to all three 3'-terminal G:C base pairs (Figures 3B and 4). In contrast, the equivalent residues in EcoRII-N (R94 and E96) make direct contacts only to the terminal G:C base pair (underlined) within the 5'-CCTGG-3' target sequence. In summary, BfiI-C trades two non-specific contacts made by the C-arm of EcoRII-N for a direct base-specific H-bond (from the N280 main-chain oxygen to the N4 atom of the terminal C base in the bottom strand). The loss of C-arm contacts to the DNA backbone in BfiI-C is compensated by an increased number of non-specific contacts coming from other structural elements (Figure 3 and Supplementary Table S1). The C-arm of RAV1-B3 is only 1 amino acid shorter than that of BfiI-C, which would be consistent with specific binding of RAV1-B3 to a 6-bp recognition site (Figure 3). On the other hand, the non-specific DNA-binding protein VRN1-B3 (9) has a C-arm that is 3 residues shorter than that of BfiI-C. It would be interesting to see whether the lengths of the N- and C-arms of B3-like domains correlate with DNA-binding specificity and recognition sequence length.
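The relative C-arm lengths stated above can be collected in one place as a starting point for the correlation the text proposes. The sketch below is illustrative only: the dictionaries `C_ARM_OFFSET` and `SITE_LENGTH` are hypothetical names holding just the differences and site lengths given in this section, and the RAV1-B3 and VRN1-B3 site lengths are left undefined because the text does not establish them.

```python
# C-arm length relative to BfiI-C, in residues, as stated in the text
# (positive = longer than BfiI-C, negative = shorter).
C_ARM_OFFSET = {
    "BfiI-C": 0,
    "EcoRII-N": +5,   # BfiI-C's C-arm is 5 residues shorter
    "RAV1-B3": -1,    # 1 residue shorter than BfiI-C
    "VRN1-B3": -3,    # 3 residues shorter than BfiI-C
}

# Recognition-site length in bp where the text gives one; None = not stated
# (RAV1-B3 is only "consistent with" 6 bp; VRN1-B3 binds DNA non-specifically).
SITE_LENGTH = {"BfiI-C": 6, "EcoRII-N": 5, "RAV1-B3": None, "VRN1-B3": None}

for name in sorted(C_ARM_OFFSET, key=C_ARM_OFFSET.get):
    offset = C_ARM_OFFSET[name]
    site = SITE_LENGTH[name]
    label = f"{site} bp" if site is not None else "not determined here"
    print(f"{name:9s} C-arm {offset:+d} residues vs BfiI-C; site {label}")
```

Sorting by offset lists VRN1-B3, RAV1-B3, BfiI-C and EcoRII-N. With only two domains of known site length, and with the longer EcoRII-N C-arm paired with the shorter site, such a table cannot by itself support a simple arm-length rule; it only frames the question.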

Figure 4. Recognition of the 3'-terminal nucleotides by BfiI-C and EcoRII-N. The C-arms of BfiI-C and EcoRII-N are orange and pink, respectively; the top DNA strand is white and the bottom DNA strand is grey. Only the last 3 bp of the BfiI recognition site and the overlapping base pairs from the EcoRII-N-DNA structure (PDB ID 3HQF) are shown.

Conservation of DNA-backbone contacts

Analysis of the DNA-backbone contacts in the BfiI-C and EcoRII-N complexes revealed conserved interactions with phosphate groups, referred to here as 'clamp' phosphates, at the 5'-ends of the top and bottom DNA strands (Figure 3B and Supplementary Figure S3). The C-arm of BfiI-C is fixed to the 5'-pACTGGG-3' phosphate group via the positively charged side chain of R272 (Figure 3B), which is spatially equivalent to EcoRII-N R81, the residue that interacts with the top-strand phosphate 5'-pCCTGG-3'. The impaired DNA-binding affinity of the R272A BfiI mutant (Table 3) and the spatial conservation of this arginine in all available B3-like domain structures (Figure 3A) underscore the importance of this backbone contact for all B3-like domains. BfiI lacks a direct structural equivalent of EcoRII-N K23, which binds the bottom-strand phosphate 3'-GGACCp-5' and is conserved in RAV1-B3, At1g16440-B3 (Figure 3A) and many other B3 domains (5). However, a similar DNA-backbone contact to the clamp phosphate 3'-TGACCpC-5' on the bottom strand is provided by BfiI-C residue K340, which resides on a different structural element (Figure 3 and Supplementary Figure S3). We propose that interactions of these conserved positively charged residues with the 'clamp' phosphates on the top and bottom DNA strands help to position the DNA molecule in the binding cleft in the conserved orientation and promote the formation of base-specific contacts. This proposal is in line with a recent analysis of crystal structures of DNA-binding protein complexes (34).
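The positional equivalence of the clamp phosphates in the two complexes can be checked by simple bookkeeping on the sequences. The Python sketch below assumes the top-strand register in which EcoRII's 5'-CCTGG-3' starts at the first base of BfiI's 5'-ACTGGG-3' (the register implied by the shared 5'-TGG-3' contacts); all helper names are hypothetical. It writes each bottom strand 5' to 3', marks the clamp phosphates quoted in the text with `p`, and maps the EcoRII positions into the BfiI frame. It tests positional correspondence only, not which residues make the contacts.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the bottom strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mark_phosphate(strand: str, index: int) -> str:
    """Show the 5'-phosphate of nucleotide `index` as 'p' (strand is 5'->3')."""
    return strand[:index] + "p" + strand[index:]


BFII, ECORII = "ACTGGG", "CCTGG"  # top strands, 5'->3'
TOP_OFFSET = 0  # assumed: EcoRII site starts at BfiI top-strand position 0

# Clamp phosphate positions read off the text, as 5'->3' nucleotide indices.
bfii_clamps = {"top": 0, "bottom": 1}    # 5'-pACTGGG-3' (R272); 3'-TGACCpC-5' (K340)
ecorii_clamps = {"top": 0, "bottom": 0}  # 5'-pCCTGG-3' (R81); 3'-GGACCp-5' (K23)


def ecorii_to_bfii(strand: str, index: int) -> int:
    """Map a 5'->3' index on an EcoRII strand into the BfiI frame."""
    if strand == "top":
        return index + TOP_OFFSET
    # The extra BfiI G:C pair sits at the 5'-end of its bottom strand,
    # so the bottom-strand register shifts by the length difference.
    return index + (len(BFII) - len(ECORII) - TOP_OFFSET)


for strand, seq_b, seq_e in [("top", BFII, ECORII),
                             ("bottom", reverse_complement(BFII), reverse_complement(ECORII))]:
    mapped = ecorii_to_bfii(strand, ecorii_clamps[strand])
    print(f"{strand:6s} BfiI 5'-{mark_phosphate(seq_b, bfii_clamps[strand])}-3'  "
          f"EcoRII 5'-{mark_phosphate(seq_e, ecorii_clamps[strand])}-3'  "
          f"same position: {mapped == bfii_clamps[strand]}")
```

Both lines report `True`: the EcoRII bottom-strand clamp (5'-pCCAGG-3') lands on the phosphate one nucleotide into the BfiI bottom strand (5'-CpCCAGT-3'), the position contacted by K340.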

CONCLUSIONS 

The crystal structure of the BfiI REase DBD (BfiI-C) in complex with DNA provides a first glimpse into the mechanism of 6-bp sequence recognition by a B3-like protein. The BfiI-C-DNA structure confirms the conserved DNA-binding mode inferred previously for B3-like domains (7) and reveals a conserved set of non-specific and specific interactions with DNA. Two positively charged BfiI-C residues, K340 and R272, contact the 'clamp' DNA phosphates at opposite ends of the DNA recognition site, thereby anchoring the wrench-like DNA-binding surface in the DNA major groove. Spatial conservation of these residues in the majority of B3 domains implies a similar DNA 'clamping' mechanism. Furthermore, both BfiI-C and EcoRII-N use their N- and C-arms to make base-specific contacts to their 6- and 5-bp recognition sequences, respectively. The amino acid residues located in the loops within the C-arms determine the length and specificity of the target site. The loops in the N- and C-arms of plant B3 domains show great variability in length and sequence, consistent with the diversity of their DNA-recognition sequences. The BfiI-C-DNA structure presented here opens the way to modeling DNA-bound B3 domains of plant TFs using a novel template.